binary: fix string parsing on big-endian hosts#6605
Open
amaanq wants to merge 2 commits intoKhronosGroup:mainfrom
Open
binary: fix string parsing on big-endian hosts#6605amaanq wants to merge 2 commits intoKhronosGroup:mainfrom
amaanq wants to merge 2 commits intoKhronosGroup:mainfrom
Conversation
37f786b to
9103f43
Compare
When parsing a SPIR-V binary with a different endianness than the host (e.g. a spec-conformant little-endian binary on ppc64/s390x), the parser reads string operands from raw `_.words` without byte-swapping first. `MakeString` then extracts bytes assuming native word layout, producing garbled strings, for example, "OpenCL.std" reads as "nepOs.LC". Byte-swap the words before passing them to `MakeString` when `requires_endian_conversion` is true, matching how other operand types are already handled via `spvFixWord`.
9103f43 to
cafde04
Compare
`extract_source.cpp` used `reinterpret_cast<const char*>` on parsed instruction words to read string data. On big-endian hosts, bytes within each native-endian word are in high-to-low memory order, but SPIR-V strings pack characters starting from the lowest byte of each word. Use `MakeString` instead of raw casts, which correctly extracts characters from the low bits of each word regardless of host endianness.
cafde04 to
133e93b
Compare
Collaborator
|
Thanks for your patience awaiting my review. Hi, please take a look at the analysis and experiments at #5302 (comment) There I show that the binary parser handles both big-endian and little-endian binaries. I haven't looked at source extraction. Please provide examples showing there is a problem, and also provide tests with any suggested fixes. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Note
The fixes are split into individual commits to make reviewing easier
Problem
Two places in the codebase read SPIR-V string data incorrectly on big-endian hosts:
binary.cpp: When parsing a SPIR-V binary with a different endianness than the host (e.g. a spec-conformant little-endian binary on ppc64/s390x), the parser reads string operands from raw_.wordswithout byte-swapping first.
MakeStringthen extracts bytes assuming native word layout, producing garbled strings — e.g."OpenCL.std"reads as"nepOs.LC".extract_source.cpp: The objdump source extraction usedreinterpret_cast<const char*>on parsed instruction words, which gives wrong byte order on big-endian hosts since SPIR-V strings packcharacters starting from the lowest byte of each word.
Solution
Byte-swap the words before passing them to
MakeStringwhenrequires_endian_conversionis true, matching how other operand types are already handled viaspvFixWord.Use
MakeStringinstead of raw casts, which correctly extracts characters from the low bits of each word regardless of host endianness.